DCG Induction Using MDL and Pased Corpora
نویسنده
چکیده
منابع مشابه
DCG Induction using MDL and Parsed
We show how partial models of natural language syntax (manually written DCGs, with parameters estimated from a parsed corpus) can be automatically extended when trained upon raw text (using MDL). We also show how we can use a parsed corpus as an alternative constraint upon estimation. Empirical evaluation suggests that a parsed corpus is more informative than a MDL-based prior. However , best r...
متن کاملMDL-based DCG Induction for NP Identification
We introduce a learner capable of automatically extending large, manually written natural language Definite Clause Grammars with missing syntactic rules. It is based upon the Minimum Description Length principle , and can be trained upon either just raw text, or else raw text additionally annotated with parsed corpora. As a demonstration of the learner, we show how full Noun Phrases (NPs that m...
متن کاملConstructing semantic representations using the MDL principle
Words receive a signiicant part of their meaning from use in communicative settings. The formal mechanisms of lexical acquisition, as they apply to rich situational settings, may also be studied in the limited case of corpora of written texts. This work constitutes an approach to deriving semantic representations for lexemes using techniques from statistical induction. In particular, a number o...
متن کاملLogic Program Induction using MDL and MAP: An Application to Grammars
Probabilistic programs provide an appealing language for describing mental theories, because they are Turing complete: any computable process may be described as a program. Program induction is the problem of inferring theories, in the form of (probabilistic) programs, that describe some set of observations. Minimum Description Length, or MDL, is one common approach to program induction [11]. T...
متن کاملUnsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora
In this paper, we reported experiments of unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics not relying on any previous lexico-syntactic knowledge about SCFs. Although preliminary, reported res...
متن کامل